Albumin-based nucleotides, their replication and use, and plasmids for use therein.

The DNA sequence coding for human serum albumin
has been isolated and inserted as two fragments into two
novel plasmids which can be replicated in E. coli. These
novel fragments can be joined to provide a unitary DNA
sequence which can then be cloned into a suitable host, e.g.
E. coli, for the expression of human serum albumin (which is
used extensively in medical practice in treating shock
conditions).
UPJOHN GJE 70/2056/02

ALBUMIN-BASED NUCLEOTIDES, THEIR REPLICATION
AND USE, AND PLASMIDS FOR USE THEREIN

This invention relates to nucleotides related to
human serum albumin (HSA), their replication and use,
and plasmids (and host substances) for use therein.

The gene for serum albumin is developmentally regulated.
Serum albumin is synthesised in mammals by the adult liver, and
its synthesis reaches a plateau in adulthood. The embryonic liver
and yolk sac, on the other hand, produce predominantly α-fetoprotein,
whose synthesis decreases drastically after birth. Recently,
Law et al determined the complete sequence of mouse
α-fetoprotein mRNA (Nature 291 (1981) 201-205). The
structure revealed extensive homology to mammalian serum
albumin, indicating that the two proteins are encoded
in the same gene family. Similar conclusions have been
reached from studies on the α-fetoprotein genes of the
rat and the mouse; see Jagodzinski et al, Proc. Natl.
Acad. Sci. USA 78 (1981) 3521-3525, and Gorin et al,
J. Biol. Chem. 256 (1981) 1954-1959.

The complete nucleotide sequence of human serum
albumin mRNA has been determined from recombinant cDNA clones and
from a primer-extended cDNA synthesis on the mRNA
template. The sequence comprises 2,078 nucleotides,
starting upstream of a potential ribosome binding site
in the 5'-untranslated region. It contains all the
translated codons and extends into the poly(A) at the
3'-terminus. Part of the translated sequence codes for a
hydrophobic prepeptide,
met-lys-trp-val-thr-phe-ile-ser-leu-leu-phe-leu-phe-ser-ser-ala-tyr-ser,
followed by a basic propeptide, arg-gly-val-phe-arg-arg. These signal
peptides are absent from mature serum albumin and, so
far, have not been identified in their nascent state in
humans. The remaining 1,755 nucleotides of the translated
mRNA sequence code for 585 amino acids which are in
agreement, with few exceptions, with the published amino
acid data for human serum albumin. The mRNA sequence
verifies and refines the repeating homology in the
triple-domain structure of the serum albumin molecule.
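The residue and nucleotide counts stated above can be checked mechanically. The following Python sketch converts the prepeptide and propeptide, as written in three-letter code, into one-letter code and computes the number of coding nucleotides at three per codon. The helper names are illustrative only and are not part of the invention.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[res.strip().lower()] for res in peptide.split("-"))


def coding_nucleotides(n_residues: int) -> int:
    """Nucleotides needed to encode n residues (one 3-nt codon each, no stop codon)."""
    return 3 * n_residues


# Signal peptides exactly as written in the text above.
PREPEPTIDE = "met-lys-trp-val-thr-phe-ile-ser-leu-leu-phe-leu-phe-ser-ser-ala-tyr-ser"
PROPEPTIDE = "arg-gly-val-phe-arg-arg"

pre = to_one_letter(PREPEPTIDE)
pro = to_one_letter(PROPEPTIDE)
print(pre, len(pre), coding_nucleotides(len(pre)))  # MKWVTFISLLFLFSSAYS 18 54
print(pro, len(pro), coding_nucleotides(len(pro)))  # RGVFRR 6 18
print(divmod(1755, 3))  # (585, 0): 1,755 nt is exactly 585 codons
```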
DETAILED DESCRIPTION OF THE INVENTION

Human serum albumin cDNA was cloned into the PstI site of plasmid
pBR322 by the oligo(dG)-oligo(dC) tailing technique. Plasmid DNA was
isolated from 97 positive colonies which hybridized to the enriched
albumin cDNA probe, and the recombinant plasmid pHA36 was found to
contain the largest insert of an albumin cDNA sequence. Its restriction
endonuclease map is shown in the drawing, together with a restriction
map of the primer-extended plasmid clone pHA206. The latter
was obtained in a second transformation experiment after initiating
the cDNA synthesis from an internal primer. This primer was a
91-base-pair DNA fragment, Hspl (152)-TaqI (182/3), isolated from pHA36.
The two plasmids, pHA36 and pHA206, share 0.15 kb of homologous DNA.
Together, they encode the entire sequence for human serum albumin,
starting with the CTT codon for leu -10 of the propeptide and extending
through the 3'-untranslated region into the poly(A).

Sequence of the Albumin cDNA. The sequence was determined for the
most part on both DNA strands to ensure accuracy. All of the restriction
sites used to end-label DNA fragments were sequenced across by
labeling a neighboring restriction site. The entire nucleotide
sequence of the serum albumin mRNA, as determined from the cloned DNA
in pHA36 and pHA206 and from the primer-extended cDNA at the 5'-terminus
of the message, is shown in the following Table 1. The inferred amino
acid sequence is also indicated. The mRNA length is 2,078 nucleotides,
of which 38 represent the 5'-untranslated region, 54 encode a
prepeptide of 18 amino acids, 18 encode a propeptide of 6 amino
acids, 1,755 code for the known 585 amino acids of serum albumin, 189
make up the 3'-untranslated region and 24 are the poly(A) sequence.
Nucleotides 5 to 15 (-34 to -24) in the 5'-untranslated region
(Table 1) are complementary to a 3'-terminal region of eukaryotic 18S RNA
[Azad, A.A. and Deacon, N.J. (1980) Nucl. Acids Res. 8, 4365-4376] and
thus could represent a ribosome binding site:

(5')...TTCTCTTCTGT...   albumin mRNA

(3')...GAGGAAGGGGUCCm⁶₂Am⁶₂A   18S RNA
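The region lengths given for the 2,078-nucleotide mRNA can be laid out as coordinates, both to confirm that they add up and to relate the 5'-untranslated positions quoted above (nucleotides 5 to 15, i.e. -34 to -24) to the start of the coding sequence. This is a minimal Python sketch; the numbering convention in the hypothetical helper `to_relative` (no position 0, with -1 being the last untranslated nucleotide) is an assumption that happens to reproduce the pairs given in the text.

```python
from itertools import accumulate

# Region lengths (nt) of human serum albumin mRNA, as listed in the text.
REGIONS = [
    ("5'-untranslated region", 38),
    ("prepeptide, 18 aa", 54),
    ("propeptide, 6 aa", 18),
    ("mature albumin, 585 aa", 1755),
    ("3'-untranslated region", 189),
    ("poly(A)", 24),
]


def region_map(regions):
    """Return (name, start, end) tuples in 1-based inclusive coordinates."""
    ends = list(accumulate(length for _, length in regions))
    return [(name, end - length + 1, end) for (name, length), end in zip(regions, ends)]


def to_relative(pos, cds_start=39):
    """Position relative to the first coding nucleotide.

    Assumed convention: no position 0, so the last 5'-untranslated nucleotide is -1.
    """
    return pos - cds_start if pos < cds_start else pos - cds_start + 1


coords = region_map(REGIONS)
for name, start, end in coords:
    print(f"{start:>5}-{end:<5} {name}")
print("total length:", coords[-1][2])   # 2078
print(to_relative(5), to_relative(15))  # -34 -24
```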



The translated portion of the mRNA sequence codes for the signal 
peptide and the main body of the albumin polypeptide chain. The 
signal peptide is composed of a hydrophobic prepeptide of 18 amino
acids and a basic propeptide of 6 amino acids (Table 1). Since prepeptides
are removed from nascent secretory proteins (like albumin) in
the endoplasmic reticulum, they are seen only in vitro in heterologous
translation systems. As yet, they have not been found within cells
[Judah, J.D. and Quinn, P.S. (1977) FEBS 11th Mtg., Copenhagen 50,
21-29; and Strauss, A.W., Donohue, A.M., Bennett, C.D., Rodkey, J.A.
and Alberts, A.W. (1977) Proc. Natl. Acad. Sci. USA 74, 1358-1362].
This is the first report of the presence and the sequence of a
prepeptide for human serum albumin. As with other secretory proteins,
the conversion of proalbumin to albumin takes place in the
Golgi vesicles, and the enzyme responsible for this cleavage is
probably cathepsin B [Judah, J.D. and Quinn, P.S. (1978) Nature 271,
384-385]. This is also the first report of the sequence of the
propeptide for normal human serum albumin.
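The two processing steps described above remove fixed-length segments from the N-terminus of the nascent chain. The Python sketch below models only that bookkeeping and says nothing about the enzymes involved. The one-letter pre- and propeptide strings are transcribed from the sequences given earlier, and `DAHK` is an illustrative stand-in for just the first four residues of mature albumin (Asp-Ala-His-Lys), truncated for brevity.

```python
from typing import Tuple

# One-letter forms of the signal peptides given in the text.
PREPEPTIDE = "MKWVTFISLLFLFSSAYS"  # 18 aa, removed in the endoplasmic reticulum
PROPEPTIDE = "RGVFRR"              # 6 aa, removed in the Golgi
# Illustrative stand-in: only the first four residues of mature albumin.
MATURE_START = "DAHK"


def process(preproalbumin: str) -> Tuple[str, str]:
    """Return (proalbumin, albumin) by trimming the fixed-length N-terminal segments."""
    if not preproalbumin.startswith(PREPEPTIDE + PROPEPTIDE):
        raise ValueError("chain does not begin with the expected pre- and propeptide")
    proalbumin = preproalbumin[len(PREPEPTIDE):]  # step 1: prepeptide removed
    albumin = proalbumin[len(PROPEPTIDE):]        # step 2: propeptide removed
    return proalbumin, albumin


proalbumin, albumin = process(PREPEPTIDE + PROPEPTIDE + MATURE_START)
print(proalbumin)  # RGVFRRDAHK
print(albumin)     # DAHK
```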

At the 3'-end of the message, the putative polyadenylation signal
sequence, AATAAA, is located 164 nucleotides downstream from the
translation termination codon TAA and 16 nucleotides upstream from the
beginning of the poly(A) sequence. Another characteristic sequence
located near the polyadenylation site has been identified by Benoist
et al. [Benoist, C., O'Hare, K., Breathnach, R. and Chambon, P. (1980)
Nucl. Acids Res. 8, 127-142]; the consensus sequence from several
mRNAs was found to be TTTTCACTGC. A similar sequence, TTTTCTCTGT, is
located 19 nucleotides upstream from the AATAAA hexanucleotide in the
human albumin mRNA (Table 1).
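Spacings of this kind are simple to recompute once a sequence is in hand. Because Table 1 is not reproduced here, the Python sketch below is exercised on a synthetic sequence built to mirror the stated spacings (164, 16 and 19 nucleotides); it is not the albumin sequence. The sketch also counts mismatches between the albumin element and the consensus quoted above. Counting only the nucleotides strictly between features is an assumed convention, and the function names are hypothetical.

```python
def motif_layout(seq, stop_index, signal="AATAAA", element="TTTTCTCTGT"):
    """Gaps (in nt strictly between features) around the polyadenylation signal.

    stop_index is the 0-based index of the first base of the stop codon.
    The poly(A) tail is taken to be the terminal run of A residues; if the
    base just before the tail is also an A, the boundary is ambiguous.
    """
    after_stop = stop_index + 3
    sig = seq.find(signal, after_stop)
    if sig < 0:
        raise ValueError("polyadenylation signal not found")
    elem = seq.rfind(element, after_stop, sig)
    polya_start = len(seq.rstrip("A"))
    return {
        "stop_to_signal": sig - after_stop,
        "signal_to_polyA": polya_start - (sig + len(signal)),
        "element_to_signal": sig - (elem + len(element)) if elem >= 0 else None,
    }


def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Synthetic sequence (not the albumin sequence) built to mirror the stated spacings.
toy = ("ATG" + "GCC" * 5 + "TAA" + "C" * 135 + "TTTTCTCTGT"
       + "C" * 19 + "AATAAA" + "G" * 16 + "A" * 24)
print(motif_layout(toy, stop_index=18))
# {'stop_to_signal': 164, 'signal_to_polyA': 16, 'element_to_signal': 19}
print(mismatches("TTTTCACTGC", "TTTTCTCTGT"))  # 2
```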
Following are examples which illustrate procedures, including the
best mode, for practicing the invention. These examples should not be
construed as limiting. All percentages are by weight and all solvent
mixture proportions are by volume unless otherwise noted.

Example 1 Isolation of Messenger RNA

Human liver mRNA was obtained following the procedure of
Chirgwin, et al [Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and
Rutter, W.J. (1979) Biochemistry 18, 5294-5299]. Immunoprecipitation
of albumin-containing polysomes was performed according to Taylor and
Tse [Taylor, J.M. and Tse, T.P.H. (1976) J. Biol. Chem. 251,
7461-7467]. In vitro translation of mRNA was carried out in a
reticulocyte cell-free system, following the manufacturer's
instructions (New England Nuclear). The translation products were
separated electrophoretically according to Laemmli [Laemmli, U.K.
(1970) Nature 227, 680-685].

Example 2 Cloning Procedures 

Double-stranded cDNA was synthesized as described previously
[Law, S., Tamaoki, T., Kreuzaler, F. and Dugaiczyk, A. (1980) Gene 10,
53-61]. It was annealed to PstI-linearized pBR322 DNA [Bolivar, F.,
Rodriguez, R.L., Greene, P.J., Betlach, M.C., Heyneker, H.L., Boyer,
H.W., Crosa, J.H. and Falkow, S. (1977) Gene 2, 95-113] that had been
tailed with 15 dG residues per 3'-terminus [Dugaiczyk, A., Robberson,
D.L. and Ullrich, A. (1980) Biochemistry 19, 5869-5873]. The annealed
DNA was used to transform E. coli strain RR1, as detailed previously
[Law, S., et al., ibid.]. The albumin clones were selected using the
colony hybridization method of Grunstein and Hogness [Grunstein, M. and
Hogness, D.S. (1975) Proc. Natl. Acad. Sci. USA 72, 3961-3965], with
labeled cDNA synthesized with the immunoprecipitated polysomal
mRNA as template.
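A well-known property of oligo(dG)-oligo(dC) tailing into the PstI site is that the PstI recognition sequence (CTGCAG, cut as CTGCA^G) is regenerated on both sides of the insert, so the insert can later be excised with PstI. The Python sketch below models only the top strand after annealing and repair. The fixed homopolymer length `n` and the short vector and insert strings are hypothetical; real tail lengths vary.

```python
PSTI = "CTGCAG"  # PstI recognition site; cut as CTGCA^G (3' overhang)


def find_all(seq, motif):
    """0-based start positions of every occurrence of motif in seq."""
    hits, i = [], seq.find(motif)
    while i >= 0:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def tailed_recombinant(vector, insert, n=15):
    """Top strand after dG/dC-tailed insertion at the PstI site, annealing and repair.

    The 3' ends of the cut vector carry dG tails and the insert carries dC
    tails; n is an assumed, fixed length of each homopolymer stretch.
    """
    i = vector.find(PSTI)
    if i < 0:
        raise ValueError("vector has no PstI site")
    left = vector[: i + 5]   # top strand ends ...CTGCA
    right = vector[i + 1 :]  # TGCAG... (TGCA filled in opposite the dG-tailed bottom strand)
    return left + "G" * n + insert + "C" * n + right


vector = "GATC" + PSTI + "TTAA"  # hypothetical stand-in for pBR322
insert = "ATGGCC"                # hypothetical cDNA without an internal PstI site
rec = tailed_recombinant(vector, insert, n=3)
print(rec)                  # GATCCTGCAGGGATGGCCCCCTGCAGTTAA
print(find_all(rec, PSTI))  # [4, 20]: one regenerated site on each side
```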

As shown in Example 5, plasmids pHA36 and pHA206 were deposited
in E. coli HB101 hosts. The plasmids were obtained from the E. coli RR1
hosts described in this example and transformed into E. coli HB101
by standard procedures well known to those of ordinary skill in this
art. The E. coli RR1 hosts were lysed and then centrifuged to
separate the plasmid DNA from the chromosomal (cell) DNA. The plasmid
DNA, remaining in the supernatant, is precipitated with ethanol and
the precipitate is resuspended in buffer, e.g., TCM (10 mM Tris·HCl, pH
8.0, 10 mM CaCl2, 10 mM MgCl2). The cells for transformation are
prepared as follows: 120 ml of L-broth (1% tryptone, 0.5% yeast
extract, 0.5% NaCl) are inoculated with an 18-hour culture of HB101
NRRL B-11371 and grown to an optical density of 0.6 at 600 nm. Cells
are washed in cold 100 mM NaCl and resuspended for 15 minutes in 20 ml
chilled 50 mM CaCl2. Bacteria are then concentrated to one-tenth of
this volume in CaCl2 and mixed 2:1 (v:v) with annealed plasmid DNA,
prepared as described above. After chilling the cell-DNA mixture for
15 minutes, it is heat shocked at 42°C for 2 minutes, then allowed to
equilibrate at room temperature for ten minutes before addition of
L-broth 10 times the volume of the cell-DNA suspension. Transformed
cells are incubated in broth at 37°C for one hour before inoculating
selective media (L-agar plus 10 µg/ml tetracycline) with 200 µl/plate.
Plates are incubated at 37°C for 48 hours to allow the growth of
transformants.
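The volumes in this protocol follow from a few fixed ratios: a 10-fold concentration of the 20 ml CaCl2 suspension, a 2:1 cell-to-DNA mix, and 10 volumes of L-broth after the heat shock. The Python sketch below works them out for a chosen DNA volume; the 100 µl DNA volume in the example and the count of full 200 µl platings are illustrative assumptions, not figures from the protocol.

```python
from dataclasses import dataclass


@dataclass
class TransformationPlan:
    concentrated_cells_ml: float  # cell suspension available after 10-fold concentration
    cells_ul: float               # competent cells used
    dna_ul: float                 # plasmid DNA used
    broth_ul: float               # L-broth added after heat shock
    plates: int                   # full 200-ul platings from the recovered culture


def plan(dna_ul, cacl2_ml=20.0, concentration=10, cells_per_dna=2.0,
         broth_factor=10, plating_ul=200.0):
    """Volumes implied by the protocol above for a chosen DNA volume."""
    concentrated_ml = cacl2_ml / concentration        # 20 ml -> 2 ml
    cells_ul = cells_per_dna * dna_ul                  # cells:DNA = 2:1 (v:v)
    if cells_ul > concentrated_ml * 1000:
        raise ValueError("not enough competent cells for this DNA volume")
    mix_ul = cells_ul + dna_ul
    broth_ul = broth_factor * mix_ul                   # 10 volumes of L-broth
    plates = int((mix_ul + broth_ul) // plating_ul)    # 200 ul per plate
    return TransformationPlan(concentrated_ml, cells_ul, dna_ul, broth_ul, plates)


# Hypothetical example: 100 ul of plasmid DNA.
print(plan(dna_ul=100))
```

With 100 µl of DNA this gives 200 µl of cells, 3 ml of L-broth and enough recovered culture for 16 full platings.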

Example 3 Mapping of Restriction Endonuclease Sites

Restriction endonucleases were obtained from Bethesda Research
Laboratories and New England Biolabs and were used according to the
manufacturers' instructions. The digested DNA fragments were analyzed
electrophoretically on agarose [Helling, R.B., Goodman, H.M. and
Boyer, H.W. (1974) J. Virol. 14, 1235-1244] or acrylamide [Dingman,
C., Fisher, M.P. and Kakefuda, T. (1972) Biochemistry 11, 1242-1250]
gels.
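Restriction mapping compares the fragment sizes obtained from single and double digests. The Python sketch below predicts fragment lengths for a linear sequence. The enzyme table lists the standard recognition sites and top-strand cut offsets for a few enzymes mentioned in this document. The 19-bp test sequence is hypothetical, and circular molecules such as intact plasmids, which need wrap-around handling, are not covered.

```python
# Recognition site and top-strand cut offset (bases after the site start).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "MspI": ("CCGG", 1),     # C^CGG
    "TaqI": ("TCGA", 1),     # T^CGA
}


def cut_positions(seq, enzyme):
    """Top-strand cut positions for one enzyme (all sites here are palindromic)."""
    site, offset = ENZYMES[enzyme]
    cuts, i = [], seq.find(site)
    while i >= 0:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    return cuts


def digest(seq, *enzymes):
    """Fragment lengths, in order, for a LINEAR molecule cut by the given enzymes."""
    cuts = sorted({c for e in enzymes for c in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AA" + "GAATTC" + "AAA" + "CCGG" + "AAAA"  # hypothetical 19-bp sequence
print(digest(toy, "EcoRI"))          # [3, 16]
print(digest(toy, "MspI"))           # [12, 7]
print(digest(toy, "EcoRI", "MspI"))  # [3, 9, 7]
```

Reading the double digest against the two single digests places the MspI site 9 bp to the right of the EcoRI cut.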

Example 4 DNA Sequencing 

DNA fragments were dephosphorylated with bacterial alkaline
phosphatase (Worthington) and labeled at the 5'-ends with
polynucleotide kinase (Boehringer-Mannheim) and [γ-³²P]ATP. Following
digestion with a second restriction endonuclease and electrophoretic
separation of the fragments, DNA sequence determination was done
according to the procedure of Maxam and Gilbert [Maxam, A. and
Gilbert, W. (1980) Methods Enzymol. 65, 499-560] and the degradation
products were separated electrophoretically on 0.4 mm acrylamide gels
as described by Sanger and Coulson [Sanger, F. and Coulson, A.R. (1978)
FEBS Letters 87, 107-110].
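In the Maxam-Gilbert method the end-labeled DNA is partially cleaved in base-specific reactions, and the sequence is read from the resulting ladders, shortest fragment first. The Python sketch below encodes the commonly used four-lane interpretation (G, A+G, C+T, C); the text does not state which reactions were run, so that lane set is an assumption. Lanes are given as sets of band positions, a band position is treated directly as the base position from the labeled end (ignoring the off-by-one that real cleavage products show), and the example ladders are simulated rather than read from a gel.

```python
LANES = ("G", "A+G", "C+T", "C")

# Band pattern across (G, A+G, C+T, C) -> base call, for the four-lane scheme.
PATTERNS = {
    (True, True, False, False): "G",
    (False, True, False, False): "A",
    (False, False, True, True): "C",
    (False, False, True, False): "T",
}


def call_bases(lanes):
    """Read a sequence from band positions, shortest (closest to the label) first.

    lanes maps each lane name to the set of positions showing a band.
    Positions with an inconsistent band pattern are called 'N'; positions
    with no band in any lane are simply not seen.
    """
    positions = sorted(set().union(*lanes.values()))
    return "".join(
        PATTERNS.get(tuple(p in lanes[name] for name in LANES), "N") for p in positions
    )


def simulate_ladders(seq):
    """Idealized ladders for a known sequence (band position = base position)."""
    lanes = {name: set() for name in LANES}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(pos)
        if base in "AG":
            lanes["A+G"].add(pos)
        if base in "CT":
            lanes["C+T"].add(pos)
        if base == "C":
            lanes["C"].add(pos)
    return lanes


print(call_bases(simulate_ladders("GATCCA")))  # GATCCA
```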

Example 5 Recombinant Plasmids pHA36 and pHA206 

As disclosed in Example 2, albumin clones were selected by
hybridizing to the enriched albumin cDNA probe. Plasmid pHA36 contained
the largest insert of an albumin cDNA sequence. Both plasmids
pHA36 and pHA206 have been deposited in a viable E. coli host in the
permanent collection of the Northern Regional Research Laboratory
(NRRL), U.S. Department of Agriculture, Peoria, Illinois, U.S.A.
Their accession numbers in this repository are as follows:
HB101(pHA36) - NRRL B-12551
HB101(pHA206) - NRRL B-12550

E. coli HB101 is a known and widely available host microbe. Its 
NRRL accession number is NRRL B-11371. 

NRRL B-12550 and NRRL B-12551 are available to the public upon
the grant of a patent. It should be understood that the availability
of these deposits does not constitute a license to practice the subject
invention in derogation of patent rights granted with the subject
instrument by governmental action.

E. coli RR1 and E. coli HB101 are known and widely available host
microbes. Their NRRL accession numbers are NRRL B-12186 and NRRL
B-11371, respectively.

pBR322 is a well known and widely available plasmid. It can be 
obtained from the following host deposit by standard procedures: 
NRRL B-12014 - E. coli RR1 (pBR322). 
YEp6 is a well known and widely available yeast episomal plasmid. 
It can be obtained from the following host deposit by standard
procedures:

E. coli HB101 (YEp6) - NRRL B-12093. 
Example 6 Assembly of the Serum Albumin Gene 

Assembling the pieces together is a straightforward task of restriction
enzymology. There is only one MspI site in the overlapping
DNA sequence of the two cDNA clones. Two enzymatic steps, (i) MspI
digestion of the two DNAs, followed by (ii) treatment with ligase, an
enzyme that seals DNA fragments, will give the desired product.
Although two other undesired DNA species will also be obtained in the
course of this recombination reaction, both of them will differ
substantially in size from the desired product. Thus, the desired DNA
species can be separated and isolated on the basis of size.
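At the sequence level, joining the two clones at the single MspI site (C^CGG) in their shared region amounts to keeping the 5' clone up to the cut and the 3' clone from the cut onward. The Python sketch below does only this bookkeeping. The short fragment strings are hypothetical, and the sketch ignores MspI sites outside the overlap as well as the side products of ligation.

```python
MSPI = "CCGG"  # MspI cuts C^CGG


def join_at_unique_mspi(five_prime, three_prime, overlap_len):
    """Join two overlapping clones at the single MspI site in their shared region.

    five_prime ends with, and three_prime begins with, the same overlap of
    overlap_len nucleotides. Only sequence bookkeeping is modeled.
    """
    overlap = five_prime[-overlap_len:]
    if not three_prime.startswith(overlap):
        raise ValueError("fragments do not share the stated overlap")
    if overlap.count(MSPI) != 1:
        raise ValueError("overlap must contain exactly one MspI site")
    site = overlap.find(MSPI)
    cut_a = len(five_prime) - overlap_len + site + 1  # cut after the first C
    cut_b = site + 1
    return five_prime[:cut_a] + three_prime[cut_b:]


a = "ATGAAA" + "TTCCGGTT"  # hypothetical 5' clone; last 8 nt are the overlap
b = "TTCCGGTT" + "GGGTAA"  # hypothetical 3' clone; first 8 nt are the overlap
joined = join_at_unique_mspi(a, b, overlap_len=8)
print(joined)  # ATGAAATTCCGGTTGGGTAA
```

The result equals the 5' clone followed by the non-overlapping remainder of the 3' clone, which is the unitary sequence the assembly aims for.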

The assembled DNA clone can be used to transform two types of 
cells: 

(a) Escherichia coli

(b) Saccharomyces cerevisiae 



(a) The vector of choice is plasmid pBR322, the same plasmid that was
successfully used for cloning the two fragments of the
serum albumin cDNA.

(b) In order to transform yeast with the serum albumin
structural gene sequence, the DNA must be inserted into one of the
existing yeast plasmid vectors. This can be accomplished by taking
advantage of the fact that several restriction endonuclease recognition
sequences are absent from the cloned serum albumin DNA. Synthetic
EcoRI DNA linkers can be ligated to the DNA fragment containing
the serum albumin sequence, followed by insertion (ligation) into one
of the yeast plasmid vectors, e.g., YEp6, at the EcoRI cloning site.
The fused chimeric plasmid can be used to transform yeast according to
an established procedure [Hinnen, A., Hicks, J.B. and Fink, G.R.
(1978) Proc. Natl. Acad. Sci. USA 75, 1929]. YEp6 can be obtained
from the NRRL repository, as disclosed supra.
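The linker strategy relies on the albumin insert lacking an internal EcoRI site, so that EcoRI digestion after linker ligation cuts only within the linkers. A Python sketch of that check follows, on the top strand only. The 8-mer linker GGAATTCC is one commonly used EcoRI linker and is an assumption here, as is the short test insert; in practice several linkers may ligate end to end and are trimmed back by the EcoRI digestion.

```python
ECORI = "GAATTC"     # EcoRI cuts G^AATTC
LINKER = "GGAATTCC"  # assumed 8-mer EcoRI linker


def linker_fragment(insert):
    """Top strand of the EcoRI fragment released after adding one linker per end.

    Raises if the insert itself contains an EcoRI site, since EcoRI would then
    cut inside the albumin sequence as well as in the linkers.
    """
    if ECORI in insert:
        raise ValueError("insert contains an internal EcoRI site")
    ligated = LINKER + insert + LINKER
    cuts = [i + 1 for i in range(len(ligated)) if ligated.startswith(ECORI, i)]
    if len(cuts) != 2:
        raise ValueError("unexpected EcoRI site at a linker junction")
    return ligated[cuts[0]:cuts[1]]  # starts with the AATT overhang


print(linker_fragment("ATGCCC"))  # AATTCCATGCCCGG
```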

Example 7 Expression of the Serum Albumin Gene

The main body of the structural gene will be transcribed by the
E. coli or yeast enzymes. If little or no albumin is produced with
the selected host, then an Escherichia coli promoter DNA sequence
carrying an initiation codon, i.e., ATG, can be ligated at the
beginning of the serum albumin structural gene. Such elements are known
and available, e.g., the lac promoter used for the expression of the
human interferon gene in E. coli [Proc. Natl. Acad. Sci. 77, 5230 (1980)];
source of promoter DNA [Proc. Natl. Acad. Sci. 76, 760 (1979)]. Also
see Nature, Vol. 281, October 18, 1979. It has already been
documented that such Escherichia coli promoter sequences function well
in the expression of foreign genes in Escherichia coli
[Mercereau-Puijalon, O., Royal, A., Cami, B., Garapin, A., Krust, A.,
Gannon, F. and Kourilsky, P. (1978) Nature 275, 505; and Goeddel, D.V.,
Kleid, D.G., Bolivar, F., Heyneker, H.L., Yansura, D.G., Crea, R.,
Hirose, T., Kraszewski, A., Itakura, K. and Riggs, A.D. (1979) Proc.
Natl. Acad. Sci. USA 76, 106]. For expression in yeast, see Rose, M.,
Casadaban, M.J. and Botstein, D. (1981) Proc. Natl. Acad. Sci. USA 78,
2460 and 4466.
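When an initiation codon is supplied in front of the structural gene, translation started at that ATG must enter the albumin coding sequence in the correct reading frame and without meeting a stop codon. The Python sketch below checks only that condition. Treating the first ATG of the added leader as the start site, and the short leader strings, are simplifying assumptions; promoter strength and ribosome binding are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def enters_gene_in_frame(leader):
    """True if translation from the first ATG in `leader` reaches the gene in frame.

    `leader` is the DNA ligated directly in front of the first codon of the
    albumin structural gene; the first ATG is assumed to be the start site.
    """
    start = leader.find("ATG")
    if start < 0 or (len(leader) - start) % 3 != 0:
        return False
    codons = (leader[i:i + 3] for i in range(start, len(leader), 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(enters_gene_in_frame("CCCATG"))     # True: ATG immediately precedes the gene
print(enters_gene_in_frame("CCCATGA"))    # False: the gene would be read out of frame
print(enters_gene_in_frame("CCCATGTAA"))  # False: in-frame stop before the gene
```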
Example 8 Screening of Clones Producing Albumin 

Immunological methods can be used to detect small amounts of
albumin made in a bacterium. Flat disks of flexible polyvinyl are
coated with the IgG fraction from an immune serum and the disks are
pressed onto an agar plate so that antigen released from an in situ
lysed microbial colony can bind to the fixed antibody. The plastic
disk is then incubated with the same total IgG fraction labeled with
radioactive iodine so that other determinants on the bound antigen can
in turn bind the iodinated antibody. Radioactive areas on the disk
expose X-ray film during autoradiography and thus identify colonies
producing the protein which is being screened for. Detailed protocols
of this procedure have been published [Broome, S. and Gilbert, W.
(1978) Proc. Natl. Acad. Sci. USA 75, 2746]. The purification of
human serum albumin can be accomplished by using procedures well known
in the art. For example, procedures disclosed in a chapter by T.
Peters, Purification and Properties of Serum Albumin, in: The Plasma
Proteins, Putnam, Ed., Academic Press, New York, 1975, can be used.

The work described herein was all done in conformity with 
physical and biological containment requirements specified in the NIH 
Guidelines. 
1. Plasmid pHA36, having a restriction endonuclease pattern as
shown in the drawing.

2. Plasmid pHA206, having a restriction endonuclease pattern as 
shown in the drawing. 

3. E. coli HB101 (pHA36) having the deposit accession number
NRRL B-12551.

4. E. coli HB101 (pHA206) having the deposit accession number
NRRL B-12550. 

5. A microorganism modified to contain a nucleotide sequence
coding for the amino acid sequence of human serum albumin, said
nucleotide sequence is as follows:

6. Nucleotide sequence of the cDNA of human serum albumin, said
nucleotide sequence is as follows:

7. Nucleotide sequence coding for the prepeptide of human serum
albumin, said nucleotide sequence is as follows:

8. Nucleotide sequence coding for pro human serum albumin, said
nucleotide sequence is as follows:

9. Nucleotide sequence coding for the pre pro human serum
albumin, said nucleotide sequence is as follows:

10. A nucleotide sequence according to any of claims
6 to 9, in essentially pure form.

11. A DNA transfer vector comprising a nucleotide 
sequence as defined in claim 5. 

12. A DNA transfer vector according to claim 11,
transferred to and replicated in a micro-organism. 

13. A DNA transfer vector according to claim 12,
which is a plasmid.

14. A DNA transfer vector according to claim 13,
wherein the plasmid is pBR322 or YEp6.

15. A process for preparing human serum albumin, 
which comprises culturing a micro-organism according 
to claim 5. 

16. A DNA transfer vector according to any of
claims 12 to 14, or a process according to claim 15,
wherein the micro-organism is a bacterium or yeast.

17. A vector or process according to claim 16, 
wherein the bacterium or yeast is E. coli or Saccharomyces
cerevisiae.